Robust, Light-weight Approaches to compute Lexical Similarity

Authors

  • Quang Do
  • Dan Roth
  • Mark Sammons
  • Yuancheng Tu
Abstract

Most text processing systems need to compare lexical units – words, entities, semantic concepts – with each other as a basic processing step within large and complex systems. A significant amount of research has taken place in formulating and evaluating multiple similarity metrics, primarily between words. Often, such techniques are resource-intensive or are applicable only to specific use cases. In this technical report, we summarize some of our research work in finding robust, lightweight approaches to compute similarity between two spans of text. We describe two new measures to compute similarity, WNSim for word similarity and NESim for named entity similarity, which in our experience have been more useful than more standard similarity metrics. We also present a technique, Lexical Level Matching (LLM), to combine such token-level similarity measures to compute phrase- and sentence-level similarity scores. We have found LLM to be useful in a number of NLP applications; it is easy to compute, and surprisingly robust to...
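To make the combination step concrete, here is a minimal sketch of the greedy max-alignment idea the abstract describes: each token in one span is matched to its highest-scoring counterpart in the other span, and the per-token maxima are averaged. This is an illustrative reading of the abstract, not the authors' released code; the function names and the exact-match stand-in for the token similarity are assumptions. In the report's setting, WNSim or NESim would fill the token-similarity slot.

    from typing import Callable, Sequence

    def llm_score(
        source: Sequence[str],
        target: Sequence[str],
        sim: Callable[[str, str], float],
    ) -> float:
        """Greedy max alignment: score each target token by its
        best-matching source token, then average over the target."""
        if not target:
            return 0.0
        total = 0.0
        for t in target:
            # Best token-level similarity achievable for this token;
            # 0.0 if the source span is empty.
            total += max((sim(s, t) for s in source), default=0.0)
        return total / len(target)

    # Toy usage with an exact-match similarity (a hypothetical
    # stand-in for WNSim/NESim); prints 2/3 for the pair below.
    exact = lambda a, b: 1.0 if a == b else 0.0
    print(llm_score("the cat sat".split(), "a cat sat".split(), exact))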


Similar Resources

A heuristic light robust approach to increase the quality of robust solutions

In this paper, optimization problems that seek robust solutions under uncertainty are considered. The light robust approach is a strong, recent method for achieving robust solutions under conditions of uncertainty. We try to improve the quality of the solutions obtained from the Light Robust method by introducing a revised approach. Considering the problem concerned, ...


A Hybrid Distributional and Knowledge-based Model of Lexical Semantics

A range of approaches to the representation of lexical semantics has been explored within Computational Linguistics. Two of the most popular are distributional and knowledge-based models. This paper proposes hybrid models of lexical semantics that combine the advantages of these two approaches. Our models provide robust representations of synonymous words derived from WordNet. We also make use ...


Robust and High Fidelity Mesh Denoising

This paper presents a simple and effective two-stage mesh denoising algorithm. In the first stage, face normals are filtered using bilateral normal filtering within a robust statistics framework. Tukey's bi-weight function is used as the similarity function in the bilateral weighting; it is a robust estimator that stops diffusion at sharp edges, which helps to retain feature...
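For reference, Tukey's bi-weight function mentioned above has a simple closed form, which makes the edge-preserving behaviour easy to see: differences at or beyond the threshold receive exactly zero weight, so diffusion stops at sharp edges. A minimal Python sketch (parameter names assumed):

    def tukey_biweight(x: float, sigma: float) -> float:
        """Tukey's bi-weight: smoothly down-weights differences up
        to sigma and assigns zero weight beyond it."""
        if abs(x) >= sigma:
            return 0.0
        r = (x / sigma) ** 2
        return (1.0 - r) ** 2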


Lexical Semantic Relatedness with Random Graph Walks

Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information fro...


Random Walks for Text Semantic Similarity

Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a ra...




Publication date: 2010